UIMA Ruta: Rapid development of rule-based information extraction applications

Authors

Peter Klügl

Martin Toepfer

Philip-Daniel Beck

Georg Fette

Frank Puppe

Abstract

Rule-based information extraction is an important approach for processing the increasingly available amount of unstructured data. The manual creation of rule-based applications is a time-consuming and tedious task, which requires qualified knowledge engineers. The costs of this process can be reduced by providing a suitable rule language and extensive tooling support. This paper presents UIMA Ruta, a tool for rule-based information extraction and text processing applications. The system was designed with focus on rapid development. The rule language and its matching paradigm facilitate the quick specification of comprehensible extraction knowledge. They support a compact representation while still providing a high level of expressiveness. These advantages are supplemented by the development environment UIMA Ruta Workbench. It provides, in addition to extensive editing support, essential assistance for explanation of rule execution, introspection, automatic validation, and rule induction. UIMA Ruta is a useful tool for academia and industry due to its open source license. We compare UIMA Ruta to related rule-based systems especially concerning the compactness of the rule representation, the expressiveness, and the provided tooling support. The competitiveness of the runtime performance is shown in relation to a popular and freely available system. A selection of case studies implemented with UIMA Ruta illustrates the usefulness of the system in real-world scenarios.

Similar resources

TextMarker: A Tool for Rule-Based Information Extraction

This paper presents TEXTMARKER, a powerful toolkit for rule-based information extraction. TEXTMARKER is based on UIMA and provides versatile information processing and advanced extraction techniques. We thoroughly describe the system and its capabilities for human-like information processing and rapid prototyping of information extraction applications.

Constraint-driven Evaluation in UIMA Ruta

This paper presents an extension of the UIMA Ruta Workbench for estimating the quality of arbitrary information extraction models on unseen documents. The user can specify expectations on the domain in the form of constraints, which are applied in order to predict the F1 score or the ranking. The applicability of the tool is illustrated in a case study for the segmentation of references, which ...

Integrated Tools for Query-driven Development of Light-weight Ontologies and Information Extraction Components

This paper reports on a user-friendly terminology and information extraction development environment that integrates into existing infrastructure for natural language processing and aims to close a gap in the UIMA community. The tool supports domain experts in data-driven and manual terminology refinement and refactoring. It can propose new concepts and simple relations and includes an informat...

An Interface for Rapid Natural Language Processing Development in UIMA

This demonstration presents the Annotation Librarian, an application programming interface that supports rapid development of natural language processing (NLP) projects built in Apache Unstructured Information Management Architecture (UIMA). The flexibility of UIMA to support all types of unstructured data – images, audio, and text – increases the complexity of some of the most common NLP devel...

Combination of Rule-based and Machine Learning for Biomedical Event Extraction

This paper describes a method for biomedical event extraction. Biomedical events occur in relation to biomedical concepts (objects) such as proteins and genes. In this work, we try a hybrid method to identify given event types relative to a given set of proteins in biomedical text. The approach combines rule-based and machine learning. A set of rules is built based on event triggers, and a set o...

Journal title:

Natural Language Engineering

Volume 22

Publication date: 2016
